32 research outputs found
Computing Functions of Random Variables via Reproducing Kernel Hilbert Space Representations
We describe a method to perform functional operations on probability
distributions of random variables. The method uses reproducing kernel Hilbert
space representations of probability distributions, and it is applicable to all
operations which can be applied to points drawn from the respective
distributions. We refer to our approach as "kernel probabilistic
programming". We illustrate it on synthetic data, and show how it can be used
for nonparametric structural equation models, with an application to causal
inference.
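As a rough illustration of the idea (a minimal sketch, not the authors' code), the following propagates kernel mean embeddings through a function of two random variables: the embedding of f(X, Y) is approximated by applying f to all pairs of expansion points and multiplying their weights. It assumes a Gaussian RBF kernel, uniform expansion weights, independent X and Y, and a toy function f; all parameter values are invented for the example.

```python
import numpy as np

def rbf_gram(a, b, sigma=1.0):
    # Gram matrix of the Gaussian RBF kernel between 1-D sample arrays a and b.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2)

rng = np.random.default_rng(0)

# Samples representing the input distributions p(X) and p(Y).
x = rng.normal(loc=1.0, scale=0.5, size=50)
y = rng.uniform(low=-1.0, high=1.0, size=50)
f = lambda a, b: a * b + np.sin(a)          # toy operation applied to the random variables

# Embedding of f(X, Y): apply f to all pairs of expansion points and combine the
# uniform expansion weights multiplicatively (X and Y are assumed independent).
fx = f(x[:, None], y[None, :]).ravel()
w = np.full(fx.size, 1.0 / fx.size)

# Sanity check: squared MMD between the propagated embedding and an embedding
# built from fresh Monte Carlo samples of f(X, Y) should be close to zero.
z = f(rng.normal(1.0, 0.5, 2000), rng.uniform(-1.0, 1.0, 2000))
v = np.full(z.size, 1.0 / z.size)
mmd2 = (w @ rbf_gram(fx, fx) @ w
        - 2.0 * w @ rbf_gram(fx, z) @ v
        + v @ rbf_gram(z, z) @ v)
print(f"squared MMD between propagated and Monte Carlo embeddings: {mmd2:.2e}")
```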
Singular Value Decomposition of Operators on Reproducing Kernel Hilbert Spaces
Reproducing kernel Hilbert spaces (RKHSs) play an important role in many
statistics and machine learning applications ranging from support vector
machines to Gaussian processes and kernel embeddings of distributions.
Operators acting on such spaces are, for instance, required to embed
conditional probability distributions in order to implement the kernel Bayes
rule and build sequential data models. It was recently shown that transfer
operators such as the Perron-Frobenius or Koopman operator can also be
approximated in a similar fashion using covariance and cross-covariance
operators and that eigenfunctions of these operators can be obtained by solving
associated matrix eigenvalue problems. The goal of this paper is to provide a
solid functional analytic foundation for the eigenvalue decomposition of RKHS
operators and to extend the approach to the singular value decomposition. The
results are illustrated with simple guiding examples.
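The reduction to a matrix problem can be checked numerically in a toy setting. The sketch below is a small self-contained check (not the paper's construction): with the quadratic kernel, whose explicit three-dimensional feature map is available, the empirical cross-covariance operator is an ordinary 3x3 matrix, and its singular values coincide with the square roots of the nonzero eigenvalues of the product of Gram matrices; the data-generating process is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Y = np.tanh(X) + 0.1 * rng.normal(size=(n, 2))      # paired samples (x_i, y_i)

# Quadratic kernel k(a, b) = (a . b)^2 with its explicit 3-D feature map, so the
# empirical cross-covariance operator C_YX = (1/n) sum_i phi(y_i) phi(x_i)^T is a matrix.
def phi(A):
    return np.column_stack([A[:, 0] ** 2,
                            np.sqrt(2.0) * A[:, 0] * A[:, 1],
                            A[:, 1] ** 2])

C_yx = phi(Y).T @ phi(X) / n
svals_direct = np.linalg.svd(C_yx, compute_uv=False)

# Gram-matrix route: the nonzero singular values of C_YX are the square roots of the
# nonzero eigenvalues of (1/n^2) K_Y K_X, computed without explicit feature maps.
K_x = (X @ X.T) ** 2
K_y = (Y @ Y.T) ** 2
eigvals = np.linalg.eigvals(K_y @ K_x / n ** 2)
svals_gram = np.sqrt(np.sort(np.abs(eigvals))[::-1][:3])

print("direct SVD        :", np.round(svals_direct, 4))
print("via Gram matrices :", np.round(svals_gram, 4))
```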
Large-scale Nonlinear Variable Selection via Kernel Random Features
We propose a new method for input variable selection in nonlinear regression.
The method is embedded into a kernel regression machine that can model general
nonlinear functions and is not a priori limited to additive models. This is the
first kernel-based variable selection method applicable to large datasets. It
sidesteps the typical poor scaling properties of kernel methods by mapping the
inputs into a relatively low-dimensional space of random features. The
algorithm discovers the variables relevant to the regression task while
learning the prediction model, by learning appropriate nonlinear random
feature maps. We demonstrate the outstanding performance of our method on a
set of large-scale synthetic and real datasets.
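To give a sense of how such a scheme might look in code, here is a heavily simplified sketch (not the authors' algorithm): per-variable relevance weights rescale the inputs before a random Fourier feature map, a ridge step fits the regression weights, and a proximal gradient step with an L1 penalty pushes the relevances of irrelevant inputs toward zero. The toy data, hyperparameters, and variable names are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 500, 8, 200                       # samples, input variables, random features
X = rng.normal(size=(n, d))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)   # only x0, x1 matter

W = rng.normal(size=(d, D))                 # random Fourier frequencies (Gaussian kernel)
b = rng.uniform(0.0, 2 * np.pi, size=D)

s = np.ones(d)                              # relevance weight for each input variable
lam_ridge, lam_l1, lr = 1e-2, 5e-3, 0.1
for _ in range(300):
    U = (X * s) @ W + b
    Z = np.sqrt(2.0 / D) * np.cos(U)        # random Fourier features of the rescaled inputs
    a = np.linalg.solve(Z.T @ Z + lam_ridge * np.eye(D), Z.T @ y)    # ridge step for the model
    r = Z @ a - y
    S = np.sin(U)
    # Hand-derived gradient of the mean squared loss with respect to each relevance s_j.
    grad = np.array([-(2.0 / n) * np.sqrt(2.0 / D)
                     * np.sum(r * X[:, j] * (S @ (a * W[j, :]))) for j in range(d)])
    s = s - lr * grad
    s = np.sign(s) * np.maximum(np.abs(s) - lr * lam_l1, 0.0)        # L1 proximal step
print("learned relevance weights:", np.round(s, 3))   # x0 and x1 should keep the largest weights
```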
Quantum mean embedding of probability distributions
The kernel mean embedding of probability distributions is commonly used in
machine learning as an injective mapping from distributions to functions in an
infinite dimensional Hilbert space. It allows us, for example, to define a
distance measure between probability distributions, called maximum mean
discrepancy (MMD). In this work, we propose to represent probability
distributions as a pure quantum state of a system described by an
infinite dimensional Hilbert space. This enables us to work with an explicit
representation of the mean embedding, whereas classically one can only work
implicitly with an infinite dimensional Hilbert space through the use of the
kernel trick. We show how this explicit representation can speed up methods
that rely on inner products of mean embeddings and discuss the theoretical and
experimental challenges that need to be solved in order to achieve these
speedups.
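The classical baseline referred to here, which works implicitly through the kernel trick, fits in a few lines; the quantum representation itself is not reproduced. The sketch below computes the standard unbiased estimate of the squared MMD between two samples under a Gaussian RBF kernel, with sample sizes and parameters chosen arbitrarily.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased estimate of the squared MMD between samples X and Y (Gaussian RBF kernel),
    # computed from inner products of the implicit kernel mean embeddings.
    def gram(A, B):
        d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    Kxx, Kyy, Kxy = gram(X, X), gram(Y, Y), gram(X, Y)
    n, m = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2.0 * Kxy.mean())

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(400, 2))
print("MMD^2, same distribution    :", mmd2_unbiased(P, rng.normal(0.0, 1.0, size=(400, 2))))
print("MMD^2, shifted distribution :", mmd2_unbiased(P, rng.normal(0.5, 1.0, size=(400, 2))))
```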
Kernel Mean Estimation via Spectral Filtering
The problem of estimating the kernel mean in a reproducing kernel Hilbert
space (RKHS) is central to kernel methods in that it is used by classical
approaches (e.g., when centering a kernel PCA matrix), and it also forms the
core inference step of modern kernel methods (e.g., kernel-based non-parametric
tests) that rely on embedding probability distributions in RKHSs. Muandet et
al. (2014) have shown that shrinkage can help in constructing "better"
estimators of the kernel mean than the empirical estimator. The present paper
studies the consistency and admissibility of the estimators in Muandet et al.
(2014), and proposes a wider class of shrinkage estimators that improve upon
the empirical estimator by considering appropriate basis functions. Using the
kernel PCA basis, we show that some of these estimators can be constructed
using spectral filtering algorithms which are shown to be consistent under some
technical assumptions. Our theoretical analysis also reveals a fundamental
connection to the kernel-based supervised learning framework. The proposed
estimators are simple to implement and perform well in practice.
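As one concrete instance of such a construction (a sketch assuming a Tikhonov filter, not necessarily identical to the estimators studied in the paper), the shrinkage weights can be obtained by solving a regularized linear system with the Gram matrix; in the kernel PCA basis this rescales the component of the empirical mean along the eigendirection with eigenvalue gamma by gamma / (gamma + n*lambda), so low-variance directions are shrunk most.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, sigma = 200, 1e-2, 1.0
X = rng.normal(size=(n, 2))

# Gaussian RBF Gram matrix of the sample.
sq = np.sum(X ** 2, axis=1)
K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma ** 2))

# Empirical kernel mean: uniform expansion weights 1/n on the sample points.
alpha_emp = np.full(n, 1.0 / n)

# Tikhonov-filtered shrinkage weights: beta = (K + n*lam*I)^{-1} K alpha_emp.
beta = np.linalg.solve(K + n * lam * np.eye(n), K @ alpha_emp)

# Effective filter factors gamma / (gamma + n*lam) in the kernel PCA basis:
# close to 1 for leading eigendirections, close to 0 for trailing ones.
gammas = np.linalg.eigvalsh(K)[::-1]
factors = gammas / (gammas + n * lam)
print("leading filter factors :", np.round(factors[:3], 3))
print("trailing filter factors:", np.round(factors[-3:], 3))
print("||beta - alpha_emp||   :", np.linalg.norm(beta - alpha_emp))
```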